Model predictive control (MPC) is an effective method for controlling robotic systems, particularly autonomous aerial vehicles such as quadcopters. However, application of MPC can be computationally demanding, and typically requires estimating the state of the system, which can be challenging in complex, unstructured environments. Reinforcement learning can in principle forego the need for explicit state estimation and acquire a policy that directly maps sensor readings to actions, but is difficult to apply to unstable systems that are liable to fail catastrophically during training before an effective policy has been found. We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment. This data is used to train a deep neural network policy, which is allowed to access only the raw observations from the vehicle's onboard sensors. After training, the neural network policy can successfully control the robot without knowledge of the full state, and at a fraction of the computational cost of MPC. We evaluate our method by learning obstacle avoidance policies for a simulated quadrotor, using simulated onboard sensors and no explicit state estimation at test time.
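To make the training/deployment split concrete, the sketch below reduces the idea to its simplest supervised core: an MPC teacher with access to the full state labels trajectories, and a student policy is regressed from raw observations onto the teacher's actions. Everything here is an illustrative assumption rather than the paper's method: the "MPC teacher" is a dummy linear controller, the sensor model is a fixed random projection, and the student is a ridge-regression map instead of a deep neural network; the full guided policy search procedure additionally adapts the teacher to the student across iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, OBS_DIM, ACT_DIM = 12, 40, 4  # toy dimensions, chosen arbitrarily

# Hypothetical fixed sensor model: maps the full state to a raw observation.
SENSOR_MAP = rng.standard_normal((OBS_DIM, STATE_DIM))

def onboard_observation(state):
    """Raw onboard sensor reading; the learned policy sees only this."""
    return np.tanh(SENSOR_MAP @ state)

def mpc_action(state):
    """Stand-in for the MPC teacher, which requires the full state.
    A real implementation would solve a finite-horizon optimal control
    problem at every step; here, a dummy stabilizing feedback law."""
    return -0.1 * state[:ACT_DIM]

def step(state, action):
    """Toy dynamics stand-in for the instrumented training simulator."""
    next_state = state.copy()
    next_state[:ACT_DIM] += 0.05 * action
    return next_state + 0.01 * rng.standard_normal(STATE_DIM)

# --- Training time: MPC under full state supervises the policy ------------
obs_buf, act_buf = [], []
for episode in range(20):
    state = rng.standard_normal(STATE_DIM)
    for t in range(50):
        action = mpc_action(state)                  # teacher uses full state
        obs_buf.append(onboard_observation(state))  # student sees raw obs only
        act_buf.append(action)
        state = step(state, action)

X = np.asarray(obs_buf)  # (N, OBS_DIM) observations
Y = np.asarray(act_buf)  # (N, ACT_DIM) teacher actions

# Fit the student by ridge regression (a deep network in the paper).
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(OBS_DIM), X.T @ Y)

def policy(observation):
    """Learned policy: maps raw sensor readings directly to actions,
    with no state estimator and no online optimization."""
    return observation @ W

# --- Test time: control from raw observations alone ------------------------
state = rng.standard_normal(STATE_DIM)
print("policy action:", policy(onboard_observation(state)))
```

The point of the split is visible in the last two lines: once trained, the policy is a single cheap forward pass on the raw observation, which is why it runs at a fraction of the computational cost of MPC and needs no explicit state estimate at test time.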